Skip to content

fix: escape regex metacharacters in extract_people and sanitize session_id in precompact hook#438

Open
adiKhan12 wants to merge 2 commits intoMemPalace:developfrom
adiKhan12:fix/input-sanitization
Open

fix: escape regex metacharacters in extract_people and sanitize session_id in precompact hook#438
adiKhan12 wants to merge 2 commits intoMemPalace:developfrom
adiKhan12:fix/input-sanitization

Conversation

@adiKhan12
Copy link
Copy Markdown

What does this PR do?

Fixes two input sanitization issues:

  1. Regex injection in extract_people() (split_mega_files.py:144) — Names from user config (~/.mempalace/known_names.json) are interpolated into a regex pattern without re.escape(). Names containing regex metacharacters (e.g. Dr. Smith, C++, Mary (Smith)) cause false matches or crashes. The rest of the codebase already uses re.escape() for this — entity_detector.py:471 and entity_registry.py:602 — this was the one place that missed it.

  2. Missing session_id sanitization in precompact hook (mempal_precompact_hook.sh) — The save hook sanitizes session_id via a safe() lambda that strips non-alphanumeric characters. The precompact hook was missing this sanitization entirely, using raw JSON input. This is part of the hooks hardening referenced in the README (Shell injection risk in hook scripts ($TRANSCRIPT_PATH / $SESSION_ID) #110).

How to test

python -m pytest tests/test_split_mega_files.py tests/test_hooks_cli.py -v

Key new tests:

  • test_extract_people_escapes_regex_metacharacters — verifies Dr. Smith matches exactly (not Dr_ Smith)
  • test_extract_people_no_crash_on_regex_metacharacters — verifies names like C++, [Admin], A*B don't crash
  • test_precompact_sanitizes_session_id — verifies path traversal in session_id is stripped

Checklist

  • Tests pass (python -m pytest tests/ -v)
  • No hardcoded paths
  • Linter passes (ruff check .)

Copy link
Copy Markdown

@web3guru888 web3guru888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean security fix. The re.escape() on KNOWN_PEOPLE names is necessary — without it, names like "Dr. Smith" match unintended patterns (Dr_Smith, DrXSmith, etc.) and names with regex metacharacters (C++, brackets) crash outright. Good catch.

The session_id sanitization in the precompact hook mirrors the existing save hook pattern, which is the right approach — both hooks now apply the same [^a-zA-Z0-9_\-] filter. The test for path traversal (../../etc/passwd) covers the critical case.

The test that verifies no crash on regex metacharacters (test_extract_people_no_crash_on_regex_metacharacters) is a good regression guard. 👍

@adiKhan12
Copy link
Copy Markdown
Author

Thanks for the review @web3guru888! Glad the approach looks solid.

The inconsistency between entity_detector.py/entity_registry.py (which already used re.escape()) and split_mega_files.py (which didn't) is what initially flagged this — same pattern, one missed spot.

For the precompact hook, I matched the save hook's safe() lambda exactly so both hooks stay in sync going forward. Happy to adjust anything if maintainers have feedback.

@web3guru888
Copy link
Copy Markdown

Makes sense — keeping both hooks in sync is the right call. The consistency across save/precompact hooks will make future changes easier to spot if the pattern ever needs updating.

…on_id in precompact hook

- Add re.escape() to extract_people() in split_mega_files.py to prevent
  false matches and crashes when known names contain regex metacharacters
  (e.g. "Dr. Smith", "C++"). Matches the pattern already used in
  entity_detector.py and entity_registry.py.

- Add session_id sanitization to mempal_precompact_hook.sh, matching the
  safe() pattern already present in mempal_save_hook.sh. Previously the
  precompact hook used raw unsanitized session_id from JSON input.

- Add tests covering regex metacharacter edge cases and precompact
  session_id sanitization.
@adiKhan12 adiKhan12 force-pushed the fix/input-sanitization branch from 41e43b5 to d71af22 Compare April 10, 2026 18:02
@bensig bensig changed the base branch from main to develop April 11, 2026 22:22
@bensig bensig requested a review from igorls as a code owner April 11, 2026 22:22
@igorls igorls added area/hooks Claude Code hook scripts (Stop, PreCompact, SessionStart) area/mining File and conversation mining bug Something isn't working security Security related labels Apr 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/hooks Claude Code hook scripts (Stop, PreCompact, SessionStart) area/mining File and conversation mining bug Something isn't working security Security related

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants